Preference tuning is a crucial process for aligning deep generative models with human preferences. This survey offers a thorough overview of recent advances in preference tuning and the integration of human feedback. The paper is organized into three main sections: 1) introduction and preliminaries, covering reinforcement learning frameworks, preference tuning tasks, models, and datasets across the language, speech, and vision modalities, as well as different policy approaches; 2) in-depth exploration of each preference tuning approach, with a detailed analysis of the methods involved; and 3) applications, discussion, and future directions, examining the use of preference tuning in downstream tasks, evaluation methods for different modalities, and open research questions. Our objective is to present the latest methodologies in preference tuning and model alignment and to deepen researchers' and practitioners' understanding of the field. We hope to encourage further engagement and innovation in this area. A companion repository is available at https://github.com/hanyang1999/Preference-Tuning-with-Human-Feedback.
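To give a concrete flavor of the objectives such a survey covers, the sketch below implements the direct preference optimization (DPO) loss, one representative preference-tuning method. This is an illustrative example rather than code from the survey; the function name, argument layout, and beta default are assumptions.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss on a batch of preference pairs.

    Each argument is the summed log-probability of a full response under
    the trainable policy or the frozen reference model; beta (an
    illustrative default) controls how far the policy may drift from
    the reference.
    """
    # Log-ratios of policy to reference for the preferred ("chosen")
    # and dispreferred ("rejected") responses.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Reward the policy for widening the margin between the two ratios.
    margin = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(margin).mean()
```

Methods of this family sidestep an explicit reward model and RL loop by optimizing the policy directly on pairwise human preference data.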
-
We propose and study a new class of polynomial voting rules for a general decentralized decision/consensus system, and more specifically for the proof-of-stake protocol. The main idea, inspired by the Penrose square-root law and the more recent quadratic voting rule, is to differentiate a voter’s voting power and the voter’s share (fraction of the total in the system). We show that, whereas voter shares form a martingale process that converges to a Dirichlet distribution, their voting powers follow a supermartingale process that decays to zero over time. This prevents any voter from controlling the voting process and, thus, enhances security. For both limiting results, we also provide explicit rates of convergence. When the initial total volume of votes (or stakes) is large, we show a phase transition in share stability (or the lack thereof), corresponding to the voter’s initial share relative to the total. We also study the scenario in which trading (of votes/stakes) among the voters is allowed and quantify the level of risk sensitivity (or risk aversion) in three categories, corresponding to the voter’s utility being a supermartingale, a submartingale, and a martingale. For each category, we identify the voter’s best strategy in terms of participation and trading.

Funding: W. Tang gratefully acknowledges financial support through the National Science Foundation [Grants DMS-2113779 and DMS-2206038] and through a start-up grant at Columbia University. D. D. Yao’s work is part of a Columbia–City University/Hong Kong collaborative project that is supported by InnoHK Initiative, the Government of Hong Kong Special Administrative Region, and the Laboratory for AI-Powered Financial Technologies.
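As rough intuition for how a polynomial rule decouples voting power from share, here is a toy Monte Carlo sketch. The concave weight stake**gamma (with gamma = 0.5, a Penrose-style square root), the fixed per-round reward, and the urn-style dynamics are simplifying assumptions; they do not reproduce the paper's exact model or its martingale analysis.

```python
import random

def simulate(stakes, gamma=0.5, reward=1.0, rounds=10_000, seed=0):
    """Urn-style toy model of a polynomial voting rule.

    Each round, one voter wins a fixed `reward` of new stake, selected
    with probability proportional to stake**gamma. With gamma < 1 the
    weighting is concave, so voting power is deliberately flatter than
    raw share (all parameter values here are illustrative).
    """
    rng = random.Random(seed)
    stakes = list(stakes)
    for _ in range(rounds):
        weights = [s ** gamma for s in stakes]
        winner = rng.choices(range(len(stakes)), weights=weights)[0]
        stakes[winner] += reward
    total = sum(stakes)
    power_total = sum(s ** gamma for s in stakes)
    shares = [s / total for s in stakes]
    powers = [s ** gamma / power_total for s in stakes]
    return shares, powers

shares, powers = simulate([10.0, 5.0, 1.0])
print("shares:", shares)  # relative shares settle down over time
print("powers:", powers)  # concave weighting compresses the power gap
```

Running this with a few different gamma values makes the trade-off visible: smaller gamma narrows the gap between large and small voters' selection probabilities, at the cost of weakening the link between stake and influence.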
-
We develop a continuous-time control approach to optimal trading in a Proof-of-Stake (PoS) blockchain, formulated as a consumption-investment problem that aims to strike the optimal balance between a participant's (or agent's) utility from holding/trading stakes and utility from consumption. We present solutions via dynamic programming and the Hamilton–Jacobi–Bellman (HJB) equations. When the utility functions are linear or convex, we derive closed-form solutions and show that the bang-bang strategy is optimal (i.e., always buy or sell at full capacity). Furthermore, we bring out the explicit connection between the rate of return in trading/holding stakes and the participant's risk-adjusted valuation of the stakes. In particular, we show that when a participant is risk-neutral or risk-seeking, corresponding to the risk-adjusted valuation being a martingale or a sub-martingale, respectively, the optimal strategy must be to either buy all the time, sell all the time, or first buy then sell, with both buying and selling executed at full capacity. We also propose a risk-control version of the consumption-investment problem; and for a special case, the "stake-parity" problem, we show that a mean-reverting strategy is optimal.
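For orientation, a generic HJB equation for a one-dimensional consumption-investment problem has the form below. This is a textbook-style sketch for context, with placeholder drift mu, volatility sigma, discount rate rho, and utility U; it does not reproduce the paper's specific PoS dynamics or boundary conditions.

\[
\rho V(x) \;=\; \sup_{(\pi,\,c)\,\in\,\mathcal{A}} \Big\{ U(c) \;+\; \big(\mu(x,\pi) - c\big)\,V'(x) \;+\; \tfrac{1}{2}\,\sigma^{2}(x,\pi)\,V''(x) \Big\},
\qquad
V(x) \;=\; \sup_{(\pi,\,c)} \mathbb{E}\Big[\int_{0}^{\infty} e^{-\rho t}\, U(c_t)\,dt \,\Big|\, X_0 = x\Big].
\]

When the expression inside the supremum is affine in the trading control, the optimizer sits at an endpoint of the admissible set, which is exactly the bang-bang (trade at full capacity) structure the abstract describes for linear utilities.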